feat(renderer): expose preserve_*_thinking flags in RendererConfig #2436
Merged
Adds two new fields to RendererConfig:
- preserve_all_thinking
- preserve_thinking_between_tool_calls
The orchestrator forwards them in two places so train and infer stay
consistent:
1. create_renderer() — bound at construction on the training-side
renderer used by build_trajectory_step / render_ids.
2. setup_inference_pool() → setup_clients() → vf.ClientConfig — the
verifiers RendererClient picks them up and forwards to its
create_renderer_pool, so every inference render carries the same
thinking-preservation behaviour.
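The two forwarding paths above can be sketched in Python as follows. This is only an illustration of the "one source of truth, two consumers" idea: PreserveFlags, training_renderer_kwargs and inference_client_extras are hypothetical names, not the actual prime-rl, verifiers or renderers API, and the use_renderer guard is an assumption about how non-renderer clients are kept free of these extras.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PreserveFlags:
    """Hypothetical stand-in for the two new RendererConfig fields."""

    preserve_all_thinking: bool = False
    preserve_thinking_between_tool_calls: bool = False


def training_renderer_kwargs(flags: PreserveFlags) -> dict:
    # Path 1: keyword arguments bound when the training-side renderer is built.
    return asdict(flags)


def inference_client_extras(flags: PreserveFlags, use_renderer: bool) -> dict:
    # Path 2: extras carried on the inference client config.
    # Assumption: they are only attached when the renderer client is in use.
    return asdict(flags) if use_renderer else {}


flags = PreserveFlags(preserve_thinking_between_tool_calls=True)
train_kwargs = training_renderer_kwargs(flags)
infer_extras = inference_client_extras(flags, use_renderer=True)

# Both sides are derived from the same object, so they cannot drift apart.
assert train_kwargs == infer_extras
```

Deriving both dictionaries from a single frozen object is one simple way to guarantee that training-side tokenization and inference rendering see identical settings.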
Both flags are off by default → zero behaviour change for existing
configs. Setting either without orchestrator.use_renderer=True is
rejected by validate_renderer_args, matching the existing renderer
knobs.
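A minimal sketch of the default-off and rejection rules described above, written in Python. RendererFlags and check_renderer_flags are hypothetical names used for illustration; the real check lives in validate_renderer_args, whose signature and error type are not shown here.

```python
from dataclasses import dataclass


@dataclass
class RendererFlags:
    """Hypothetical stand-in: both flags default to False (no behaviour change)."""

    preserve_all_thinking: bool = False
    preserve_thinking_between_tool_calls: bool = False


def check_renderer_flags(flags: RendererFlags, use_renderer: bool) -> None:
    """Raise ValueError if a preserve flag is set while the renderer is disabled."""
    enabled = [name for name, value in vars(flags).items() if value]
    if enabled and not use_renderer:
        raise ValueError(f"{', '.join(enabled)} requires use_renderer=True")


# Defaults pass regardless of use_renderer.
check_renderer_flags(RendererFlags(), use_renderer=False)

# Setting a flag is accepted only when the renderer is enabled.
check_renderer_flags(RendererFlags(preserve_all_thinking=True), use_renderer=True)
```

Failing at validation time, rather than silently ignoring the flag, means a misconfigured run stops before any tokenization happens.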
Bumps source pins:
- verifiers 3b77145 → a7516a1 (PR #1298: ClientConfig propagation)
- renderers (now a separate repo): pinned to fe67f9f (PR #4:
construction-time preserve flags)
Re-applies #2433, which was merged into feat/unify-inference-generate
by mistake.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Force-pushed from cc381eb to 2544d0c.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 2544d0c.
The flags were added to the shared RendererConfig but only wired up in the orchestrator. SFT also constructs a renderer via create_renderer for training-side tokenization, so it must forward both flags; otherwise they are silently ignored. Also tighten the SFT validate_renderer_args to reject either flag when use_renderer=False, matching the orchestrator validator.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
samsja
approved these changes
May 17, 2026

Summary
Re-applies #2433 onto main. The original PR landed against feat/unify-inference-generate and was merged there by mistake.
Adds two new fields to RendererConfig:
- preserve_all_thinking: bool = False
- preserve_thinking_between_tool_calls: bool = False
Both RL and SFT forward the flags to keep tokenization byte-consistent:
- Orchestrator (RL), create_renderer(): bound at construction on the training-side renderer used by build_trajectory_step / render_ids for trajectory tokenization.
- Orchestrator (RL), setup_inference_pool() → setup_clients() → vf.ClientConfig: the verifiers RendererClient picks them up and forwards to its create_renderer_pool, so every inference render carries the same thinking-preservation behaviour.
- SFT, create_renderer(): same construction-time binding on the training renderer, so SFT tokenization matches the configured preservation policy.
Both flags are off by default, so existing configs see no behaviour change. Both the orchestrator and SFT validate_renderer_args validators reject either flag when use_renderer=False, matching the existing renderer knobs.
Per-flag semantics
- preserve_all_thinking: emit reasoning_content for every past assistant turn, even those before another user message.
- preserve_thinking_between_tool_calls: emit reasoning_content only for the current tool cycle (the assistant→tool→…→assistant block after the last user message). Older blocks fall back to the template default. Strict subset of preserve_all_thinking.
Example config
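To make the per-flag semantics concrete, here is a minimal Python sketch that takes the two flag values and reports which assistant turns would have their reasoning_content explicitly preserved. The function preserved_turns and the role-only message format are assumptions made for illustration; this is not the renderers implementation, and it does not model what the template default does with the remaining turns.

```python
def preserved_turns(
    messages,
    preserve_all_thinking=False,
    preserve_thinking_between_tool_calls=False,
):
    """Return indices of assistant messages whose reasoning a flag forces to be kept."""
    assistant = [i for i, m in enumerate(messages) if m["role"] == "assistant"]
    if preserve_all_thinking:
        # Every assistant turn, including those before an earlier user message.
        return assistant
    if preserve_thinking_between_tool_calls:
        # Only the current tool cycle: assistant turns after the last user message.
        last_user = max(
            (i for i, m in enumerate(messages) if m["role"] == "user"),
            default=-1,
        )
        return [i for i in assistant if i > last_user]
    # Neither flag set: nothing is forced; the template default applies.
    return []


# Two tool cycles separated by a second user message.
roles = ["user", "assistant", "tool", "assistant",
         "user", "assistant", "tool", "assistant"]
conversation = [{"role": r} for r in roles]

all_kept = preserved_turns(conversation, preserve_all_thinking=True)
cycle_kept = preserved_turns(conversation, preserve_thinking_between_tool_calls=True)

print(all_kept)    # [1, 3, 5, 7]
print(cycle_kept)  # [5, 7]
```

In this conversation the second flag selects only the turns after the last user message, which is a subset of what preserve_all_thinking selects.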
Dependencies
No pyproject.toml / uv.lock changes. After the monorepo split in #2507, verifiers and renderers are uv workspace members vendored under deps/. Main's current submodule pins already include the upstream changes this PR depends on:
- deps/verifiers@711a7c7: 76 commits ahead of the verifiers PR (#1298) that added preserve_*_thinking to ClientConfig.
- deps/renderers@87084dc: 36 commits ahead of the renderers PR (#4) that added construction-time preserve_*_thinking to create_renderer.
File-path delta vs #2433
The original PR targeted feat/unify-inference-generate, where configs live under src/prime_rl/configs/. On main they live under packages/prime-rl-configs/src/prime_rl/configs/ (the workspace-package split landed in #2416). The diff is otherwise identical.
🤖 Generated with Claude Code
Note
Medium Risk
Adds new renderer configuration knobs and threads them through orchestrator/inference client construction; while default-off, it touches rollout/inference plumbing and bumps verifiers plus adds a new renderers dependency, which could affect runtime compatibility.
Overview
Exposes two new RendererConfig booleans (preserve_all_thinking and preserve_thinking_between_tool_calls) and extends orchestrator validation to reject them unless orchestrator.use_renderer=true.
When the renderer client is enabled, these flags are forwarded both to create_renderer() and through setup_inference_pool() / setup_clients() into vf.ClientConfig (guarded so non-renderer clients don't receive unknown extras), keeping training-side tokenization and inference rendering consistent.
Updates dependencies by adding a direct renderers git dependency and bumping the pinned verifiers revision; unit tests are adjusted to assert the new parameters are passed through.
Reviewed by Cursor Bugbot for commit cc381eb. Bugbot is set up for automated code reviews on this repo.